Partial Training for a Lexicalized-Grammar Parser
Authors
Abstract
We propose a solution to the annotation bottleneck for statistical parsing by exploiting the lexicalized nature of Combinatory Categorial Grammar (CCG). The parsing model uses predicate-argument dependencies for training, which are derived from sequences of CCG lexical categories rather than from full derivations. A simple method is used for extracting dependencies from lexical category sequences, resulting in high-precision but incomplete and noisy data. The dependency parsing model of Clark and Curran (2004b) is extended to exploit this partial training data. Remarkably, the accuracy of the parser trained on data derived from category sequences alone is only 1.3% lower in F-score than that of the parser trained on complete dependency structures.
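To make the idea concrete, the following is a minimal sketch of how dependencies might be read off a lexical category sequence; the category notation, the nearest-match heuristic, and names such as extract_deps are illustrative assumptions, not the paper's actual extraction method.

    # Illustrative sketch (not the paper's extraction method): derive
    # predicate-argument dependencies from a CCG lexical category sequence by
    # matching each argument slot of a category to the nearest word on the
    # appropriate side whose category result is compatible.

    def argument_slots(category):
        """Split a CCG category into (result, [(slash, argument), ...]),
        e.g. '(S\\NP)/NP' -> ('S', [('/', 'NP'), ('\\', 'NP')])."""
        cat, slots = category, []
        while True:
            depth, pos = 0, -1
            for j in range(len(cat) - 1, -1, -1):  # outermost slash, right to left
                c = cat[j]
                if c == ')':
                    depth += 1
                elif c == '(':
                    depth -= 1
                elif c in '/\\' and depth == 0:
                    pos = j
                    break
            if pos == -1:
                return cat, slots
            slots.append((cat[pos], cat[pos + 1:].strip('()')))
            cat = cat[:pos]
            if cat.startswith('(') and cat.endswith(')'):
                cat = cat[1:-1]

    def extract_deps(words, categories):
        """Return (head_index, dependent_index, argument_category) triples."""
        deps = []
        for h, cat in enumerate(categories):
            for slash, arg in argument_slots(cat)[1]:
                # '/' looks right, '\' looks left; take the nearest match.
                rng = (range(h + 1, len(words)) if slash == '/'
                       else range(h - 1, -1, -1))
                for d in rng:
                    if argument_slots(categories[d])[0] == arg:
                        deps.append((h, d, arg))
                        break
        return deps

    words = ['IBM', 'bought', 'Lotus']
    cats = ['NP', '(S\\NP)/NP', 'NP']
    print(extract_deps(words, cats))  # [(1, 2, 'NP'), (1, 0, 'NP')]

Even a heuristic this crude illustrates why such data are high-precision but incomplete: long-range or coordinated arguments are simply missed rather than guessed.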
Similar Papers
Combining Supertagging and Lexicalized Tree-Adjoining Grammar Parsing
In this paper we study various reasons and mechanisms for combining supertagging with Lexicalized Tree-Adjoining Grammar (LTAG) parsing. Because of the highly lexicalized nature of the LTAG formalism, we show experimentally that factors other than sentence length affect observed parse times. In particular, syntactic lexical ambiguity and sentence complexity (both are terms we define i...
Perceptron Training for a Wide-Coverage Lexicalized-Grammar Parser
This paper investigates perceptron training for a wide-coverage CCG parser and compares the perceptron with a log-linear model. The CCG parser uses a phrase-structure parsing model and dynamic programming in the form of the Viterbi algorithm to find the highest scoring derivation. The difficulty in using the perceptron for a phrase-structure parsing model is the need for an efficient decoder. W...
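The update at issue is the standard structured-perceptron rule (Collins, 2002); below is a minimal sketch in which 'decode' and 'features' are assumed interfaces standing in for the parser's Viterbi decoder and feature extractor, not its actual API.

    # A minimal sketch of the structured-perceptron update (Collins, 2002);
    # 'decode' and 'features' are placeholder interfaces.

    from collections import defaultdict

    def perceptron_train(data, decode, features, epochs=5):
        """data: iterable of (sentence, gold_parse) pairs."""
        weights = defaultdict(float)
        for _ in range(epochs):
            for sentence, gold in data:
                guess = decode(sentence, weights)  # highest-scoring derivation
                if guess != gold:
                    # Reward gold features, penalize the incorrect guess.
                    for f, v in features(sentence, gold).items():
                        weights[f] += v
                    for f, v in features(sentence, guess).items():
                        weights[f] -= v
        return weights

Essentially all of the training cost sits inside decode, which is why an efficient decoder is the difficulty the snippet mentions.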
Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models
This paper describes a number of log-linear parsing models for an automatically extracted lexicalized grammar. The models are “full” parsing models in the sense that probabilities are defined for complete parses, rather than for independent events derived by decomposing the parse tree. Discriminative training is used to estimate the models, which requires incorrect parses for each sentence in t...
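For context, a conditional log-linear model of this kind assigns each complete parse d of a sentence S a probability of the standard form below (notation is assumed here, not copied from the paper):

    p(d \mid S) = \frac{\exp(\lambda \cdot f(d, S))}{\sum_{d' \in \Omega(S)} \exp(\lambda \cdot f(d', S))}

where f(d, S) is a global feature vector, \lambda the learned weights, and \Omega(S) the set of candidate parses for S; the normalizing sum over \Omega(S) is why discriminative training needs the incorrect parses the snippet mentions.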
Self-Training PCFG Grammars with Latent Annotations Across Languages
We investigate the effectiveness of self-training PCFG grammars with latent annotations (PCFG-LA) for parsing languages with different amounts of labeled training data. Compared to Charniak's lexicalized parser, the PCFG-LA parser was more effectively adapted to a language for which parsing has been less well developed (i.e., Chinese) and benefited more from self-training. We show for the first t...
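For illustration, self-training of this general kind follows the loop sketched below; the 'parser' object and its train/parse methods are hypothetical, not the PCFG-LA system's actual interface.

    # Generic self-training loop of the kind described above; the 'parser'
    # interface is hypothetical.

    def self_train(parser, labeled, unlabeled, rounds=1):
        parser.train(labeled)
        for _ in range(rounds):
            # Parse raw text with the current model and treat the 1-best
            # output as additional (noisy) training material.
            auto = [parser.parse(sentence) for sentence in unlabeled]
            parser.train(labeled + auto)
        return parser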
A Structured Language Model Based on Context-Sensitive Probabilistic Left-Corner Parsing
Recent contributions to statistical language modeling for speech recognition have shown that probabilistically parsing a partial word sequence aids the prediction of the next word, leading to “structured” language models that have the potential to outperform n-grams. Existing approaches to structured language modeling construct nodes in the partial parse tree after all of the underlying words h...